feat(ast/estree): raw transfer (experimental) #9516
Merged: graphite-app[bot] merged 1 commit into `main` (Mar 4, 2025)
CodSpeed Performance Report: merging #9516 will not alter performance.
Note: I've changed the NAPI CI task to build in release mode. The 40,000 Test262 tests take a long time to run, so building in release mode helps bring it down to a more reasonable run time. Hopefully I can speed up these tests in a follow-on PR.
First version of raw transfer (#2409). Provides a solid speed-up transferring data from Rust to JS. Further iterations will speed it up further.

Tests check that output via raw transfer matches output via JSON transfer exactly for all of Test262 which Acorn is able to parse. It should also match for TypeScript and JSX, but that's not covered by tests as yet.

However, I think we should consider this experimental for now because there are a few rough edges (discussed below). Therefore I've put it behind an "experimental" flag:

```js
const ret = parseSync(filename, code, { experimentalRawTransfer: true });
console.log(ret.program);
```

### How it works

* JS creates an `ArrayBuffer` and passes it to Rust.
* Rust creates an `Allocator` using that buffer as its backing memory.
* Rust parses the AST into that allocator (including comments).
* Rust also converts the module record and errors into arena types, and writes them into the allocator.
* Rust writes the offset at which the data begins into the end of the buffer.
* Control passes back to JS.
* JS code decodes the data in the buffer, and creates JS objects fitting the ESTree shape.

There is *no* serialization step on the Rust side. There is no JSON encoding or decoding involved. These are the main sources of the speed gain.

### Preconditions

The reason all this works is that all AST types (and other transferred types) are `#[repr(C)]`, so the memory layouts of those types are specified and can be statically calculated. `oxc_ast_tools` does those layout calculations, and generates the JS-side deserializer code based on its knowledge of what offsets struct fields are at, what the discriminants of enums are, etc.

### Rough edges

There are a few rough edges.

#### `Allocator::from_raw_parts`

This PR adds a method `Allocator::from_raw_parts`. It creates a `bumpalo::Bump` that uses an existing block of memory as the allocator's backing memory.
This ability is required, but it's not an API that `bumpalo` offers (and the maintainer was not willing to add it).

The implementation of `Allocator::from_raw_parts` is extremely hacky. It depends on internal implementation details of `bumpalo` which are not specified, and the method used to determine the memory layout of `Bump` depends on unspecified details of Rust's memory model. So, while it does seem to work in practice, it is, strictly speaking, UB.

It could break in a future Rust version, or with esoteric compiler flags, e.g. [`-Zrandomize-layout`](https://doc.rust-lang.org/nightly/unstable-book/compiler-flags/randomize-layout.html). And it could break if we update `bumpalo` from the version we're currently using. For this reason, this PR pins the version of `bumpalo` in `Cargo.toml`. I think this is OK for now, but it's unpleasantly fragile.

We can resolve all these problems by replacing `bumpalo` with our own arena allocator (which I think we should do anyway, for other reasons). In the meantime, `Allocator::from_raw_parts` is behind a cargo feature `from_raw_parts`, to avoid it being used anywhere else in our codebase.

#### Unspecified type layouts

As noted above, all AST types are `#[repr(C)]`, so their layout is specified and stable. There are a few types which are outside of our control, though:

1. `Vec<T>`. We use `allocator_api2::vec::Vec`, which is not `#[repr(C)]`.
2. `&str`. I don't believe the layout of this type is specified.
3. `Option<T>`.

For production-grade stability, we need to try to work around these.

`Vec` - we should replace `allocator_api2::vec::Vec` with our own `Vec` type. This will also allow us to reduce its size (https://github.com/oxc-project/backlog/issues/18).

`&str` - again, we need our own string slice type, to work around the problem of lone surrogates (#3526) and to make it more efficient (oxc-project/backlog#46). We can make its memory layout stable at the same time.

`Option<T>` is tricky.
We don't want to replace Rust's `Option` because of the niche optimization benefits it gives. I'm not sure this one is 100% soluble, but Rust gives at least *some* guarantees about the layout of `Option`. Maybe we can avoid using `Option` in the AST in ways which go outside that specification.

#### Large buffers

For speed, raw transfer requires the entire AST to be in a single contiguous memory region, and for the start of that region to be aligned on 4 GiB.

JS does not support 64-bit integers, so offset calculations are much cheaper when the buffer is aligned on 4 GiB and no larger than 4 GiB - because then all pointers have the same value in their top 32 bits. So a pointer can be treated as a 32-bit value (bottom 32 bits only). JS can handle 32-bit integers no problem.

When creating a large buffer on the JS side, it *mostly* ends up aligned on a 4 GiB boundary anyway, but occasionally it doesn't. So in order to ensure the buffer contains at least one region which is aligned on 4 GiB and 2 GiB in size, we have to create a 6 GiB buffer.

I *think* this is OK. On systems with virtual memory, allocating 6 GiB only reserves 6 GiB of *virtual* memory. Physical memory is only consumed when the pages of that allocation are actually written to. But I may be missing something here, and memory exhaustion might be a danger. I think we need some real-world usage to find out.

*Possibly* we could reduce the need for so much memory if the JS deserializer called into a small WASM module to do offset calculations. WASM can work with `i64` values. Or there may be other solutions.

#### Endianness

Currently only little-endian systems are supported. Probably in practice this doesn't matter much, but it'd be ideal to cover big-endian too.